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Abstract. The paper discusses the reconstruction of potentials for quantum systems at finite temperatures 
from observational data. A nonparametric approach is developed, based on the framework of Bayesian 
statistics, to solve such inverse problems. Besides the specific model of quantum statistics giving the prob- 
ability of observational data, a Bayesian approach is essentially based on a priori information available for 
the potential. Different possibilities to implement a priori information are discussed in detail, including 
hyperparameters, hyperfields, and non-Gaussian auxiliary fields. Special emphasis is put on the recon- 
struction of potentials with approximate periodicity. The feasibility of the approach is demonstrated for a 
numerical model. 
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1 Introduction 

A successful application of quantum mechanics to real 
world systems relies essentially on an adequate reconstruc- 
tion of the underlying potential, describing the forces gov- 
erning the system. The reconstruction of potentials or 
forces from available observational data defines an em- 
pirical learning task. It also constitutes a typical example 
of an inverse problem. Such problems are notoriously ill- 
defined in the sense of Tikhonov In that case ad- 
ditional a priori information is required to yield a unique 
and stable solution. A Bayesian framework is especially 
well suited to include both, observational data and a pri- 
ori information, in a quite flexible manner. 

Inverse scattering theory and inverse spectral 

theory |^,||,|l^,|ll| are two classical research fields which 
deal in particular with the reconstruction of potentials 
from spectral data. Both theories describe the kind of data 



02.50.Rj Nonparametric inference - 02.50.Wp Inference 



which are necessary, in addition to a given spectrum, to 
determine a potential uniquely. In inverse scattering the- 
ory these additional data are for example phase shifts, 
obtained far away from the scatterer. For the bound state 
problems studied in inverse spectral theory these addi- 
tional data may consist of a second spectrum obtained for 
boundary conditions different from those for the first spec- 
trum. The approach of Bayesian Inverse Quantum Me- 
chanics (BIQM) we will refer to in the following is not 
exclusively designed for spectral data but is able to work 
with quite arbitrary observational data jl^ . It can thus be 
easily ada pte d to a large variety of different reconstruction 
scenarios i||,|l4[|l|. 



The basics of a Bayesian framework are summarized 
in Section ||. Setting up a Bayesian approach for a specific 
application area requires the definition of two basic prob- 
abilistic models. First, a likelihood model is needed giving, 
for each possible potential, the probability of the obser- 
vational data. The likelihood model of quantum statistics 
is discussed in Section ^. Second, a prior model has to be 
chosen to implement available a priori information. Prior 
models which are useful for inverse quantum statistics are 
presented in Section ^. Technically the most convenient 
prio r model s ar e Gaussian processes, presented in Section 
fl.l| . Section |4.2|shows how covariance and mean of a Gaus- 



sian process can be related to a priori information about 
approximate sy mme tries of the potentials to be recon- 
structed. Section 4.3 concentrates on approximate period- 
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icity. Section 4.4 on potentials with discontinuities. Prior 
models are m ade more flexible by using hyperparameters 
(Section O), or more general hyperfields^ being function 
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hyperparameters (Section | 4.6|). R elated non-Gaussian pri- 
ors are the topic of Section^J. Having defined likliliood 
and prior models Section ^ discusses the equations to be 
solved for reconstructing a potential. Finally, Section || 
presents numerical applications. 



2 Bayesian approach 

Empirical learning is based on observational data D. In 
particular, we will distinguish "dependent" variables x, 
representing measurement results, and "independent" vari- 
ables O, characterizing the kind of measurement performed. 
In the context of inverse quantum mechanics the latter 
denotes the observables which are measured. Such observ- 
ables may for example be the position, the momentum, 
or the energy of a quantum particle. Variables x and O 
are assumed to be measurable and represent therefore vis- 
ible variables. Observational data will be assumed to con- 
sist of n pairs D — {{xi,Oi)\l < i < n} — {xt,Ot), 
where xt and Ot denote the vectors with components Xi 
or Oi, respectively. Such data will also be called training 
data. In empirical learning one tries to extract a "general 
law" from observations. In this paper the quantum po- 
tential V to be reconstructed will represent this "general 
law" . (Similarly, in the Bayesian reconstruction of quan- 
tum states the object to be reconstructed is the density 
operator of an unknown state [|l6|,0|l^,|l9|.) Potentials, 
considered not to be directly observable, represent in our 
context the hidden or latent variables. We will now use the 
Bayesian framework to relate unobservable potentials to 
observational data. 

The Bayesian approach is a general probabilistic frame- 
work to deal with empirical learning problems ||2C|,pT|,p2|, 
, 25 , 2^,|l^ . Predicting results of future measurements 
on the basis of given training data is achieved by means of 
the predictive probability p{x\0,D) (or predictive density 
for continuous x), which is the probability of finding the 
value X when measuring observable O under the condition 
that the training data D are given. To calculate the pre- 
dictive probability a probabilistic model is needed which 
describes the measurement process. Such a model is spec- 
ified by giving the probability p{x\0, V) of finding x when 
measuring observable O for each possible potential V. As 
p{x\0, V), considered as function of V for fixed x and O, 
is known as likelihood of V, we will call this the likelihood 
model. For inverse quantum problems the likelihood model 
is given by the axioms of quantum mechanics and will be 
discussed in Section 

According to the rules of probability theory the pre- 
dictive probability can now be written as an integral over 
the space of all possible potentials V, 



p{x\0,D)= JdVp{x\0,V)p{V\D). 



(1) 



We note that in Eq.(|i|) we have assumed that the prob- 
ability of X is completely determined by giving potential 
and observable and does not depend on the training data, 
p{x\0, V, D) = p(x|0, V), and that the probability of the 



potential given the training data does not depend on the 
observables selected in the future, p(F|0, Z?) = p{V\D). 
If the set of possible potentials is a space of functions, the 
integral in (^ is a functional integral. 

As the likelihood model is assumed to be given, learn- 
ing consists in the determination of p{V\D), known as the 
posterior for V . To this end, we relate the posterior for V 
to the likelihood of V under the training data by applying 
Bayes' theorem. 



p{V\D) = 



p{xt\Ot,V)p{V) 
p{xt\Ot) 



(2) 



assuming p{V\Ot) = p{V), analogous to Eq. (fj). In the 
numerator of Eq. (||) appears, besides the likelihood, the 
so called prior p{V). This prior gives the probability of V 
before training data have been collected. Hence it has to 
comprise all a priori information available for the poten- 
tial. The need for a prior model, complementing the likeli- 
hood model, is characteristic for a Bayesian approach. The 
denominator in Eq. (H) plays the role of a normalization 
factor and can be obtained from likelihood and prior by 
integration over y as p(xt|Ot) = j dV p{xt\Ot p{V) . 

From a Bayesian perspective learning appears as up- 
dating the probability for V caused by the arrival of new 
data D. If more data become available this process can be 
iterated, the old posterior becoming the new prior which 
is then updated yielding a new posterior. 

In practice, a major difficulty is the calculation of the 
integral over all possible V to get the predictive proba- 
bility (0). Even if one resorts to a discrete approximation 
for X the integral (|l|) is typically still very high dimen- 
sional. The key point is thus to find a feasible approxi- 
mation for that integral. Two approaches are common in 
Bayesian statistics. The first one is an evaluation of the 
integral by Monte Carlo methods ||2|,|^,||,|2|]. The sec- 
ond one, which we will pursue in the following, is the so 
called maximum a posteriori approximation (MAP), being 
a variant of the saddle point method pl|, p4| |30| , pl|, |3^ , 
In MAP one assumes the posterior to be sufficiently 
peaked around the potential V* which maximizes the pos- 
terior, so that approximately 



p{x\0,D) «p(xlO,y*), 



(3) 



with 



V* = argmax^gyp(F|£)) = &vgma.yiy ^^p{xt\Ot ,V)p{V) . 

(4) 

Maximizing the posterior with respect to G V means, 
according to Eq. (|^) with the denominator independent of 
V , maximizing the product of likelihood and prior. 

The Bayesian framework discussed so far can analo- 
gously be applied to a variety of different contexts, includ- 
ing regression, density estimation and classification prob- 
lems . The case of a Gaussian likelihood with fixed vari- 
ance, for example, is known as regression problem, while 
problems with general likelihoods are known as density 
estimation. 
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3 Likelihood model of quantum statistics 

The first step in applying the Bayesian framework to in- 
verse problems of quantum mechanics or quantum statis- 
tics is the definition of the likelihood model . This is 
easily obtained from the axioms of quantum mechanics. 
Consider a system prepared in a state described by a den- 
sity operator p. As our aim will be to reconstruct poten- 
tials V from observational data, we have to choose a p 
which depends on the potential. The probability to find 
value X, when measuring an observable represented by the 
Hermitian operator O, is given by 

p{x\0,V) = Tr(^Po{x)p{V)), (5) 

where Poix) = |x, C| denotes the projector on 
the space of (orthonormalized) eigenfunctions |x, C) of O 
with eigenvalue x and the variable ^ distinguishes eigen- 
functions with degenerate eigenvalues. 

In particular, for a canonical ensemble at temperature 
1//3 (setting Boltzmann's constant equal to 1) the density 
operator reads 

To be specific, we will study in the following Hamiltoni- 
ans of the form H ^ T + V, with kinetic energy T = 
— {l/2m)A, (with Laplacian A, mass m, and setting Ti — 
1) and a local potential 

V{x,x')^v{x)5{x-x'), (7) 

defined by the function v{x). Note that the formalism pre- 
sented in the following works with nonlocal potentials as 
well, numerical calculations, however, would in that case 
be more demanding. For the likelihood models correspond- 
ing to time-dependent quantum systems and to many- 
body systems in Hartree-Fock approximation we refer to 

In the following we will study observational data con- 
sisting of n position measurements Xi . This corresponds to 
choosing the position operator for the observables Oi = x 
with x\xi) = Xi\xi). Hence, for a canonical ensemble, the 
likelihood (|^) becomes for a single position measurement 

P{X,\X,V) = ^Po\(i)a{Xi)\^ = {\(f){Xi)\'^) (8) 
a 

with (non-degenerate) eigenfunctions (pa of H and ener- 
gies i.e., H\(f)a) — Ea\(t)a)- Angular brackets (■■•) 
denote a thermal expectation under the probabilities Pa 
= exp(—(3Ea)/Z with Z = exp(— according to 
Eq. (^. For independent data Di = {xi,Oi), 

n n 

p{xt\Ot, v) = l[p{x,\x, v)=l[{ |0(a;OP ). (9) 

1=1 i=l 

A quantum mechanical measurement changes the state of 
the system, i.e., it changes p. Hence, to obtain indepen- 
dent data under constant p requires the density operator 



to be restored before each measurement. For a canonical 
ensemble this means to wait between two consecutive ob- 
servations until the system is thermalized again. 

Choosing a parametric family of potentials v{x; f ) one 
could now maximize the likelihood with respect to the 
parameters ^, and choose as reconstructed potential 

v*{x) — v{x;^*) with ^* = a,rgma,x^p{xT\OT,v{^)). 

This is known as maximum likelihood approximation and 
works well if the number of data is large compared to the 
fiexibility of the selected parametric family of potentials. 
This method does however not yield a unique optimal po- 
tential if the flexibility is too large for the available number 
of observations. (A possible measure of the "flexibility" of 
a parametric family is given by the Vapnik-Chervonenkis 
dimension or variants thereof.) In such cases, the in- 
clusion of additional restrictions on v in form of a priori 
information is essential. This holds especially for nonpara- 
metric approaches, where each number v{x) is treated as 
individual degree of freedom. Including a priori informa- 
tion generalizes the maximum likelihood approximation of 
Eq. (|I^ to the MAP of Eq. (|). 

4 Prior models 

4.1 Gaussian processes 

A flnite number of observational data cannot completely 
determine a function v{x). Hence, besides observational 
data, additional a priori information is necessary to recon- 
struct a potential in BIQM. In nonparametric approaches 
it is advantageous to formulate a priori information di- 
rectly in terms of the function v{x) itself. A convenient 
choice for a prior is a Gaussian process, 

p{v)= J^dct^y e-5(^-"o|Koh-«o>^ (11) 

where 

(t; - wo I Ko I w - wo) = (12) 

/^^'N.)-..(.)lK.(...')M.')-.o(.')l. 

The function vq is the mean or regression function, repre- 
senting a reference potential or template for v. The inverse 
covariance Kq is a real symmetric, positive (semi)definite 
operator which acts on potentials rather than on wave 
functions and defines a distance measure on the space of 
potentials. For technical convenience one may introduce 
explicitly a factor A multiplying Kq to balance the influ- 
ence of the prior against the likelihood term. A Gaussian 
prior as in Eq. (O) is already a quite flexible tool for 
implementing a priori knowledge. A bias towards smooth 
functions v{x), for instance, can be implemented by choos- 
ing the negative Laplacian as inverse covariance Ko = 
—A. Including higher derivatives in Ko would result in 
even smoother potentials, in the sense that higher deriva- 
tives of v{x) become continuous. For example, a common 
smoothness prior used for regression problems is the Ra- 
dial Basis Function prior Kq = exp (— crpgpZ\/2) 
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4.2 Covariances and approximate symmetries 

Prior information on potentials v can often be related 
to approximate invariance under specific transformations 
ijisl . Typical examples of such transformations are symme- 
try operations like translations or rotations. To be specific, 
assume that a (not necessarily local) potential V com- 
mutes approximately, but not exactly, with some unitary 
operator S, 

V « S^VS = SV, (13) 

which defines an operator S acting on V. In particular, 
we may choose a prior p{V) cx exp{— £^s(y)} with a prior 
energy 

Es^^{V-SV\V-SV) = ^{V\Ko\V). (14) 

This shows that the expectation of an approximate sym- 
metry of V under 5* can be implemented by choosing a 
Gaussian prior with inverse covariance operator 



Ko = (I-S)t(I-S), 



(15) 



where I denotes the identity operator. Symmetry opera- 
tions S{9), with corresponding S{9), may depend on a pa- 
rameter (vector) 6. Approximate invariance under S{9i) 
for several 6i can be implemented by using the sum (or 
integral, for continuous variables) 



^^ = ^E<^-s(0,)^|F-s(^^.)V') 

i 

= lY.(^\Ko{e,)\V). (16) 



Alternatively, one may require approximate symmetry for 
only one value of 0, not fixed a priori. For example, one 
may expect an approximately periodic potential with un- 
known periodicity length 9 which also has to be deter- 
mined from the data. Such 9 are known as hyperparame- 
ters and will be discussed in Section 4.5. 



Lie groups are continuously parameterized transforma- 
tions 

S{9)^e^^^^^% (17) 

where 9i are the real parameters and the = — (the 
superscript denoting the transpose) are antisymmetric 
operators representing the generators of the infinitesimal 
transformations of the Lie-group. We can define a prior 
energy as an error measure with respect to an infinitesimal 
transformation. 



/V -{\ + 9,s,)V V-{l + 9,Si)V 



i 

= Uv\Y.^Is,\V). (18) 



For instance, a Laplacian smoothness prior for a local 
potential v{x) can be related to an approximate symme- 
try under infinitesimal translations. For the group of d— 



dimensional translations which is generated by the gradi- 
ent operator V this can be verified by recalling the multi- 
dimensional Taylor formula for expanding v around x 

S{0)v{x) = eJ^^^'^^vix) = ^ i^i-|ZiL„(a;) = v{x+9). 

(19) 

Up to first order S « IH- 9iVi. Hence, for infinitesimal 
translations, the error measure of Eq. (Hq) becomes 

l^lv-{l + 9,V,)v v-{l + 9,Vi)v^ 



Es = 



1 



= -2(^1^1^)' 



(20) 



assuming vanishing boundary terms. This is the classical 
Laplacian smoothness term. 



4.3 Approximate periodicity 

In this paper we will in particular be interested in poten- 
tials which are approximately periodic. To measure the 
deviation from exact periodicity for a local potential v{x) 
let us define the difference operators 

{Vfv){x) =v{x + e)^v{x). (21) 

{y^v){x) ^v{x)-v{x-e), (22) 

For periodic boundary conditions (Vg)^ = — V^, where 
(Vg )-^ denotes the transpose of Vg . Hence, the operator 

-zi, = -V^V« = (V«)^V« (23) 

defined in analogy to the negative Laplacian, is positive 
(semi)definite, and a possible prior energy is an error term 
which measures the deviation from exact periodicity for 
given period 0, 



-Bs = ^ j dx \v[x) - w(a 



Ae\v). 



(24) 



Discretizing v the operator for periodic boundary con- 
ditions becomes, for example on a mesh with six points 
and 9 — 2, the matrix 



so that 
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-1 
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-1 
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1 











-1 





V 







1 
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2 





-1 





-1 












2 





-1 





-1 






-1 
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-1 












-1 
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-1 






-1 





-1 
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(25) 



(26) 
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As every periodic function with v{x) = v{x + 9) is in 
the null space of Ag typically another error term has to 
be added to get a unique maximum of the posterior. For 
example, combining a prior energy with a Laplacian 
smoothness term yields a Gaussian prior of the form (^l]) 
with inverse covariance Kq = — A(Z\ + ^Aq) and prior 
energy 

Es = -^{v\A + ^Ag\v), (27) 

with weighting factors A, 7. In case the period is not 
known, it can be treated as hyperparameter as will be 
discussed in Section 4.5. Clearly, a nonzero reference po- 
tential vq can be included in Eq. (|27|). In Eq. (p4[), one 
may also sum over several periods 



Es = \'^w{k) [ dx \v{x) - v{x + ke)\'^ , 
^ k=i 



(28) 



where w{k) is a weighting function, decreasing for larger k. 
Prior energies as in ( p8| ) enforce approximate periodicity 
over longer distances than a prior energy of the form ( P4|) . 
The latter, on the other hand, is more robust than ( ^8]) 
with respect to local deviations from periodicity, like a 
locally varying frequency. 

Instead of choosing an inverse covariance Kq with sym- 
metric functions in its null space, approximate symmetries 
can be implemented by using explicitly a symmetric ref- 
erence function vq = Svq for the Gaussian prior ([III). For 
approximate periodicity, this would mean to choose a pe- 
riodic reference potential vq{x) = vq{x + 9) in the prior 
energy Es = ^{v — vo\ Kq \ v — vq) where Kq could be 
for example the identity or a differential operator. Thus a 
periodic reference potential favors a specific form for the 
reconstructed potential, including a specific frequency and 
phase. This is different for the covariance implementation 
( p^ ) of approximate periodicity where only the frequency 
is relevant and reference potentials can still be chosen ar- 
bitrarily. They may, for example be nonperiodic functions 
or functions with even higher symmetry like in Eq. ( p7|) 
where tig = is invariant under all translations. Flexible 



reference potentials will be studied in Section 4.5 



4.4 Potentials with discontinuities 

Smooth potentials v{x) with discontinuities can either be 
approximated by using discontinuous templates vo{x;9) 
or by eliminating matrix elements of the inverse covari- 
ance which connect the two sides of the discontinuity. For 
example, consider the discrete version of a negative Lapla- 
cian with unit lattice spacing and periodic boundary con- 
ditions, 



Kn = -A = 
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Decomposing the matrix ( p9| ) into square roots we write 
Kq = W-^'W (see also Section 4.6) where a possible square 
root is 



W = Vf 



(30) 

Similarly, the derivative operator d/dx represents a square 
root of the negative Laplacian for periodic boundary con- 
ditions. Two regions can now be disconnected by deleting 
all lines of W which have matrix elements in both regions. 
For instance, the first three points in the six-dimensional 
space of Eq. ( pO| ) can be disconnected from the last three 
points by setting W(3, •) and W(6, •) to zero. 
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(31) 



Squaring of W yields a positive semidefinite operator 
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Kn = W = 



/ 

(32) 

resulting in a smoothness prior which is ineffective be- 
tween points from different regions. In contrast to us- 
ing discontinuous templates, the height of the jump at 
the discontinuity has not to be given in advance when 
working with disconnected Laplacians (or other discon- 
nected inverse covariances) . On the other hand training 
data are then required for all separated regions to de- 
termine the free constants which correspond to the zero 
modes of the local Laplacians. The reconstruction of dis- 
continuous functi ons with non-Gaussian priors will be dis- 
cussed in Section 4.7. 



4.5 Hyperparameters 

Parameters of the prior are known as hyperparameters [ p^ 
Like potentials hyperparameters 9 are not di- 
rectly observable and represent hidden variables. In the 
presence of hyperparameters a prior for v can be decom- 
posed as follows 



p{v)^ depiv\9)p{9), 



(29) 



(33) 



where p{9) is known as hyperprior. The likelihood does 
not depend on 9, the predictive probability (|^), however. 
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contains then an integral over 0, 



p{x\0,D) 



(34) 



1 



p{xt\Ot) 



dv d9p{x\0, v) p{xt\Ot, v)p{v\0) piO). 



Like the integral over v, the integral over 9 can be calcu- 
lated either by Monte Carlo methods or in MAP. We re- 
mark that, when a ^-dependent prior is written in terms of 
a corresponding prior energy p{v\6) oc e~^^^'*\ the nor- 
malization J dv e"^*^"!^^ is independent of v but does in 
general depend on 0. 

Hyperparameters 9 can be single numbers or vectors. 
They can describe continuous transformations, like trans- 
lation, rotation or scaling of template functions and scal- 
ing of inverse covariance operators. For real 9 and differ- 
entiable posterior, stationarity conditions can be found by 
differentiating the posterior with respect to 9. 

Instead of continuous transformations of templates or 
inverse covariances one can consider a finite collection of 
alternative reference potentials Vi or alternative inverse co- 
variances Ki. For example, a potential to be reconstructed 
may be expected to be similar to one reference poten- 
tial out of a small number of possible alternatives Vi. The 
"class" variables i are then nothing else but hyperparam- 
eters 9 with integer values. 

Binary parameters allow to select from two reference 
functions or two inverse covariances that one which fits 
the data best. Indeed, writing 



vo{9) = {l-9)vi+9v2, 
Ko(0) = (l-0)Ki+0K2, 



(35) 
(36) 



a binary 9 € {0, 1} implements hard switching between al- 
ternative templates or inverse covariances, corresponding 
to a conditional prior 



with 



piv\9) oc 



^i(^) = 2^'^ ~ "^^ ' "'^^ ' ~ 

E2{v) = i(w - W2 I K2 I W - V2). 



(37) 

(38) 
(39) 



Similarly, a real 9 G [0, 1] in (^ or (|3^) yields soft mixing. 
In that case, however, the mixing of templates in (pq) is 
not equivalent to a mixing of prior energies as in (p7D 
because for real 9 Eqs. (|35| ) and (^6|) lead to mixed terms, 
like (1 - 9)9{v - ui I Ko 1 - V2)/2 for Ki = K2. When 9 
takes integer values the integral / d9 becomes a sum J2e 
so that prior, posterior, and predictive probability have 
the form of a finite mixture with components 9 p6| . 

For a moderate number of components one may be able 
to include all of the mixture components in the calcula- 
tions. If the number of mixture components is too large 
one must select some of the components, for example by 
creating a random sample using Monte Carlo methods, 
or by solving for the 9* with maximal posterior. In con- 
trast to typical optimization problems for real variables. 



the corresponding integer optimization problems are usu- 
ally not very smooth with respect to 9 (with smoothness 
defined in terms of differences instead of derivatives), and 
are therefore often much harder to solve. 

There exists a variety of deterministic and stochastic 
integer optimization algorithms, which may be combined 
with ensemble methods like genetic algorithms |37|, ^|39| , 
and with homotopy methods like simulated an- 
nealing |^,^|4^,E^, 46j. Annealing methods are similar 
to (Markov chain jMonte Carlo methods, which aim at 
sampling many points from a specific distribution (i.e., for 
example at fixed temperature). For Monte Carlo methods 
it is important to have (nearly) independent samples and 
the correct limiting distribution for the Markov chain. For 
annealing methods the aim is to find the correct minimum 
by smoothly changing the temperature from a finite value 
to zero. For the latter it is thus less important to model 
the distribution for nonzero temperatures exactly, but it is 
important to use an adequate cooling scheme for lowering 
the temperature. 



4.6 Hyperfields 

The hyperparameters 9 considered so far have been real 
or integer numbers, or vectors with real or integer com- 
ponents 9i. In this section we will discuss priors parame- 
terized by functions, called hyperfields jl^, resulting in a 
still larger flexibility of the formalism. In numerical calcu- 
lations where functions have to be discretized hyperfields 
stand for high dimensional hyperparameter vectors. 

Using hyperfields one has to keep in mind that a gain 
in flexibility at the same time tends to lower the influence 
of the prior. For example, consider as hyperfield a com- 
pletely adaptive reference potential 9{x) = vq{x) within 
a Gaussian prior (pl|). Then, for any v{x) the prior en- 
ergy vanishes for vq{x) ~ v{x). In the absence of addi- 
tional hyperpriors p(f)) the corresponding MAP solution 
for the hyperfield 9{x') — wofx) is thus 9*{x) = w(a;) for 
which the Gaussian prior (O) becomes uniform in v(x). 
Hence the price to be paid for the additional flexibility 
introduced by hyperfields are weaker priors and a large 
number of additional degrees of freedom. This can con- 
siderably complicate calculations and requires sufficiently 
restrictive hyperpriors for the hyperfields. 

Let us define local hyperfields 9(x) to be hyperfields de- 
pending on the position variable x. (In general hyperfields 
can be introduced which depend on other real variables 
or on several position variables.) Local hyperfields can be 
used, for example, to adapt templates or inverse covari- 
ances locally. To this end, we express real symmetric, pos- 
itive (semi)definite inverse covariances by square roots or 
(real) filter operators W, so that 



Ko = W^W. 



In components 



Ko(.,.')^/^."W-(.,.")W(.",.'), 



(40) 



(41) 
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and therefore 

{v - vo\Ko\v - vq) 



dxdx' dx" [v{x) — Vo{x)\ 

X W^(a;,a;')W(x',a;") 
X [v{x") - vo{x")\ 

dx \uj{x)\^ , 



where we define the filtered difference 

Lo{x)= fdx'W{x,x')[v{x') ~vo{x')]. 



(42) 



(43) 



For instance, a square root (|30| ) of the discrete negative 
Laplacian (^9|) corresponds for wq = to a filtered differ- 
ence uj{x) = v{x + 1) — v{x). 

The exponent of a Gaussian prior for a local potential 
V can thus be written as an integral over x, 

p(v) oce-^("); E{v) = ^Jdx\u;{x)\\ (44) 

In contrast to Eqs. (|3^) and ( |3^ ) the representation (^) is 
well suited for introducing local hyperfields. For instance, 
an adaptive prior 



piv\e) 



-E(v\e) 



(45) 



with a real local hyperfield 0{x) e [0, 1] can be obtained 
by mixing locally two alternative filtered differences 

Lu{x-e) = [1 - 9{x)]uJi{x) + e{x) UJ2{x), (46) 

where the two uJi may differ in their filters and/or refer- 
ence potentials. In that case the hyperfield 6{x) can locally 
select the best mixture of the filtered differences uji, i.e., 
that one which yields in (^) the largest probability or 
smallest prior energy 



- 2 



E{v\9) ^ ^ I dx\ujix;0)\^ +liiZv{0) 

[1 - 6{x)]uji{x) + e{x)uj2{x] 
Here the normalization factor 

Zvi9) = [ d;e-^/'*-K-;«)l' 
Jvev 



(47) 
\nZv{9). 

(48) 



depends in general on 6 if the filters of the Ui differ. 
Clearly, allowing an unbounded — cxd < 9(x) < oo any 
function uj{x;9) can be written in the form of Eq. (^6|), 
provided lui{x) ^ uj2{x) for all x. 

In contrast to soft mixing with real functions 9{x) 
a binary local hyperfield 9{x) G {0,1} implements hard 
switching between alternative filtered differences. Since in 
the binary case 9^ = 9,{1- 9f = {l-9), and 9{1 -9) = 
0, Eq. (^) becomes [compare Eq. (|^] 

E{v\9) = i jdx([l-9{x)]\oj^{x)\^ 

+9{x)\uj2{x)\'')+\nZv{9), (49) 



while for real 9{x) Eq. ( [47D includes a mixed term in uoiUJ2- 
It is sometimes helpful to transform an unrestricted real 
hyperfield —oo<g{x) < oo into a bounded real hyperfield 

9{x) e [0, 1] by 



9{x)=aig{x)-d), 
with threshold •& and sigmoidal transformation 
1 1 



a[x 



1 + e- 



-(tanh(i^a;) + 1). 



(50) 



(51) 



In the limit ly oo the transformation <t[x) of ( pl| ) ap- 
proaches the step function 0{x) and ( ^0[ ) results in a bi- 
nary 9{x) = 0{g{x) - z9) e {0, 1}. 

Analogous to the global mixing or global switching in 
Eq. ( |35| ) and Eq. (^6|), the alternative filtered differences 
uJi{x) at position x in Eq. (^) can be constructed by lo- 
cal mixing or switching between template functions vi{x'), 
V2{x') or filters Wi(a;,a;'), W2(a;,a;') using a local hyper- 
field 9{x), 

v,{x'-9) = [I - 9{x)]vi{x') + 9{x)v2{x'), (52) 
W{x,x'-9) = [I- 9{x)]Wi{x,x')+ 9{x)'W2{x,x').{bZ) 

It is important to note that the local templates or ref- 
erence potentials Vx{x'\9) are functions of x' and x. In- 
deed, to obtain a filtered difference u>[x;9) at position x, 
a reference function Vx is needed for all x' for which the 
corresponding W(a;, x') is nonzero, since 



uj{x\9) = / dx'W{x,x')[v{x')-Vxix';t 



(54) 



In this way the whole template function Vx{x';9), rather 
than individual function values vq{x,9), is adapted indi- 
vidually for every local filtered difference. In particular, 
the local reference potentials of Eq. (|5^) have to be dis- 
tinguished from one global, locally adapted reference po- 
tential 



vo{x'; 9)^ [I- 9{x')] vi{x') + 9{x') V2{x'), 



(55) 



which at first glance seems to be the natural generalization 
of Eq. (|3^) to local hyperfields. Only in Gaussian prior 
terms with the identity I as covariance, local template 
functions Vx{x' , 9) are not required. In that case Vx{x'; 9) is 
only needed for x = x' and we may directly write Vx{x' ] 9) 
= vo{x';9), skipping the variable x, and obtain the prior 
energy 



i Jdx \uj{x; 



\' ^l{v~MO)\v-iom. (56) 



We remark that one can also generalize Eq. (|5^), which 
uses the same vi{x'), V2{x') for all x, by working with 
reference potentials vi^x{x'), V2,x{x') which vary with the 
position X at which the filtered difference lli{x) is required. 
This yields 



vx{x'; 9)^[1~ 9{x)] vi^x{x') + 9{x) V2,x{^') 



(57) 



8 



J. C. Lemm et al: Bayesian Reconstruction of Approximately Periodic Potentials at Finite Temperature 



For binary i 
variance 



Eq. (m) corresponds to an inverse co- 



Ko{9)= Jdx K,(0)= JdxW^ie)W^ 



[0) 



dx ([1 - e{x)]Wi^,Wl, + 9{x)W2,^Wl,) (58) 



(59) 



with 

written as dyadic product of the vector Wx{0) — W(x, • ; 9) 
and with analogously defined Wi,x = Wi(a;,-). For 9- 
dependent inverse covariances the normalization factors 
Zv(6') become 6'-dependent. They have to be included 
when integrating over 9 or solving for the optimal 9 in 
MAP. 

In Eqs. dH) and (^ it is straightforward to introduce 
two binary hyperfields 9,9', one for the reference potential 
Vx and one for the filter W. This results in a conditional 
prior 

piv\9,9') oc e-i/'^^("-^xWIK.(e')|f-'".(e)> 

^ ^-k Jdx\uj(x;e.e')\^ ^ (go) 

Here we can write 



dx \io{x; 9, 9'){' ^{v- vo{9, 9') \ Ko(0') | v - vo{9, 9')) 



dx{vx{9)\Kx{9')\vx{9)) 
-{vo{9,9')\Ko{9')\v^{9,9')), (61) 
with an effective template vo{9, 9') given by 

vo{9,9') = Ko(0')"' [dxKx{9')vx{9), (62) 



and effective inverse covariance Ko(0') = Jdx'Kx{9') as 
in Eq. (pq). Since the last two terms in Eq. (EW are v- 
independent constants (only depending on 9, W) we see 
that for fixed hyperfields this prior is minimized hy v = 
vo{9,9'). For given hyperparameters 9, 9' we can write 
p(v\9,9') oc e~^'^"l^'^ ) with a prior energy of the form 
E{v\9,9') = l{v-vo{9,9')\Ko{9')\v-vo{9,9')). 

As the product of Gaussians is again a Gaussian sev- 
eral Gaussian prior factors can easily be combined. In this 
way one can implement a nonlocal property like smooth- 
ness and still avoid local template functions Vx(x' , 9) by 
combining a Gaussian prior with Kq = I as in (56) with a 
Gaussian prior with nondiagonal covariance and zero (or 
fixed) template, 



1 



1 



E{v\9) ^^{v~ MO) I « - ^o(^)) + -{v\K\v). (63) 
Combining both terms yields 

^(^'l^) = l({v~vo{9)\Ko\v-vo{9)) 



vo{9)\I-K,'\iom], (64) 



with the second term being independent of v and with 
effective template and effective inverse covariance 



va{9)=K^'vo{9), Ko = I + K. 



(65) 



For differential operators Kg the effective vq{9) is thus a 
smoothed version of vo{9). 

The extreme case would be to treat vq and W itself 
as unrestricted hyperfields. As already discussed, this just 
eliminates the corresponding prior term. Hence, to restrict 
the flexibility, typically a smoothness hyperprior may be 
imposed to prevent highly oscillating functions 9{x). For 
real 9{x), for example, a smoothness prior like a Lapla- 
cian prior {9\ —A\9)/2 can be used in regions where it is 
defined. (The space of functions for which a smoothness 
prior with discontinuous templates is defined depends on 
the locations of the discontinuities.) An example of a non- 
Gaussian hyperprior is 



p{9) 



where r is a constant and 



Jdx Co(x) 



(66) 



(67) 



with a sigmoid <j{x) as in (pT|). For ly —> oo the sigmoid 
approaches a step function and Cg{x) becomes zero at 
locations where the square of the first derivative is smaller 
than a certain threshold Q < < oo, and one otherwise. 
For discrete x one can analogously count the number of 
jumps larger than a given threshold. One can then penalize 
the number Nd{9) of discontinuities where {89/ dx)^ — oo 
and use 

p{9) oc e-^^^'W. (68) 

In the case of a binary field this corresponds to count- 
ing the number of times the field changes its value. The 
expression Cg of Eq. (|67|) can be generalized to 



Cgix) = a {\u;g{x)\^ - dg) , 



(69) 



where, analogous to Eq. (43) 



u;g{x)= dx'Wg{x,x'Mx')-tg{x% (70) 



with template tg{x') representing the expected form for 
the hyperfield, and a filter operator defining a distance 
measure for hyperfields. Parameters of the hyperprior like 
T in Eq. ( |66| ) or Eq. (^8|) can be treated as higher level 
hyperparameters. 



4.7 Non— Gaussian priors and auxiliary fields 

As an alternative to introducing hyperfields 9{x) one can 
work with priors which are explicitly non-Gaussian with 
respect to v. This can be done by introducing auxiliary 
fields B{x; v) whose function values are not considered as 
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independent variables but are directly defined as function- 
als of V. (For the sake of simplicity we will for B{x; v) also 
write B(x) or B{v), depending on the context.) Like hy- 
perfields, auxiliary fields can select locally the best adapted 
filtered difference from a set of alternative w^. 

For instance^ consider the auxiliary field [compare with 
Eqs. M) or 



where 



B{x) = a (u(x) - 1?) 



u{x) = |t^i(a;)P - |w2(a;)|' 



(71) 



(72) 

1? represents a threshold, cr(x) a sigmoidal function as in 
(^), and the w,; are filtered differences defined in terms of 
V according to Eq. (^3|). Again a binary field B{x) is ob- 
tained by letting the sigmoid approach the step function. 
Because the uji depend on w, it is clear from the definition 
( [n] ) that the auxiliary field B{x) is no independent hy- 
perfield but has values being functionals of v. Notice that 
B{x) is nonlocal with respect to v(x) if uji(x) is nonlocal; 
a value B{x) then depends on more than one w(a;)-value. 
For a negative Laplacian prior in one-dimension Eq. ( [7l|) 
reads, 



B{x) 



d{v — vi) 



dx 



d{v - V2) 



dx 



(73) 



While auxiliary fields B{x) are directly determined by v, 
hyperfields are indirectly coupled to v through the MAP 
stationarity equations. Conversely, an auxiliary field B{x) 
can be treated formally as independent hyperfield if a La- 
grange multiplier term A [B{x) — a {u(x) — "(9)] is added to 
the prior energy in the limit A — > 00. 

Like hyperfields 6{x) auxiliary fields B{x) can be used 
to adapt reference potentials vq or filters W. However, a 
prior as in Eq. ( [ll| ) is non-Gaussian with respect to v if 
vo{B) and 'Ko{B) depend on B and thus also on v. Fur- 
thermore, analogous to hyperpriors p{9), additional prior 
terms p{B{v)) oc exp{—EB{v)) for v can be included, for- 
mulated in terms of an auxiliary field B(x). As in Eq. ( ^9| ) 
a binary B{x) can switch between two filtered differences 

\^{x-B)\'' = [1 -B(x)]|wi(x)|2 + B(x)|c^2(a;)P, (74) 

within a (non-Gaussian) prior for v 

p(t;) oce^^(")^^«(^), (75) 

where the normalization factor Z — Jdv e~-^''^^~-^^''^^ of 
( [75| ) is by definition independent of v. Hence it can be 
skipped for MAP calculations also for non-Gaussian p{v). 
In Eq. 



E{v) = - Jdx {[1- B{x)]\uji{x)\^ + B{x)\lu2{x)\^) , 

_ (76) 
according to Eq. (|74|), while Eb{v) depends on v only 
through B{v). For example, the number of switchings can 
be restricted by taking 



where Nd{B) counts the number of discontinuities of B(x). 
Other choices, for real B{x), are quadratic energies 



(78) 



or non-quadratic energies of the form 

Eb{v) = ^ JdxCsix) (79) 

where, similar to (|69|), 

CB{x)^a{\ujB{x)\^ -de). (80) 

and 

u;Bix)= Jdx'WB{x,x')[Bix') -tBix% (81) 

is a filtered difference of B with filter operator and 
template tB- 

Let us compare a non-Gaussian prior built of prior 
energies (^ and (|7^) for a binary auxiliary field ( |7l] ) 

p{v) CX e"^ Jdx{[l-Bix)]\uj,ix)\^ + B(x)Mx)f)-iN^(B) 

(82) 

with the similar-looking combination of Gaussian prior 
( ^9|) with hyperprior (68) for a binary hyperfield. 



p{v,e) ^p{v\e)p{9) (X (83) 
Jdx [(i-e{x))\uji{x)\^+e(x)\u,2ix)\'^]-iNi{e)~\n Zv(e) 

Eq. ( |83| ) works with conditional probabilities p{v\6), hence 
the corresponding normalization factors are in general 9- 
dependent and have to be included for MAP calculations. 
Typically, MAP solutions for B, Nd{B) and Cb being di- 
rectly defined in terms of the corresponding MAP solution 
for V are different from the MAP solutions for 6*, Nd{9) and 
Co, respectively. However, if the filtered differences Ui in 
Eq. ( |83| ) differ only in their templates, the normalization 
term can be skipped. Then assuming = 0, p{9) oa 1, 
p{B) oc 1 the two equations are equivalent for 9{x) = 
(|wi(a;)|^ — \uj2{x)\'^). In the absence of hyperpriors, it 
is indeed easily seen that this is a selfconsistent solution 
for 9 for every given v. In general, however, when hyperpri- 
ors are included, another solution for 9 may have a larger 
posterior. 

Hyperpriors p{9) or additional auxiliary prior terms 
p{B) can be useful to enforce specific global constraints 
for 9(x) or B{x). In natural images, for example, discon- 
tinuities are expected to form closed curves. Priors or hy- 
perpriors, organizing discontinuities along lines or closed 
curves, arc thus important for image segmentation or im- 
age restoration |^,^,^,50,5l|. A similar method has 
been used in the determination of piecewise smooth re- 
laxation time spectra from rheological data . 

Another useful class of non-Gaussian priors generaliz- 
ing (H) has the form [Ellslg 



Eb{v) = MB), 



(77) 



p{v) OC e 2 J \ Ji ^ 



(84) 
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Fig. 1. Example of a non-quadratic "cup" -function ^{x) — 
a(1.0 - 1/(1 + {\x - xol/bY')), with a= 5, 6 = 10, 7 = 0.7, xo 
= 0. 



where ip is a, non-quadratic function. This function ip can 
be fixed in advance for a given problem or adapted using 
hyperparameters. Typical choices to allow discontinuities 
are symmetric "cup" functions with minimum at zero and 
flat tails for which one large step is cheaper than many 
small ones (see Fig. ^. 

Table |l] summarizes the basic variants of prior energies 
discussed in the paper. 



Gaussian prior 


E{v) = i (t; - wo 1 Ko j w - uo) 


( 


LI 


) 


with hyperparameter 6 


E{v\e) = i^{v~vi\Ki\v~vi) 
+1(11 - ■y2 K2 « - V2) 


(§) 


with local hyperfield 0{x) 


E{v\e) = i /dx (^[1 - e{x)]\Loi(x)f 

+e{x)\uj2{x)\^^ + \nZv{e) 
E{v\e) = ^{v- vo{&) 1 y - «o(e)> + ^{v\K\v) 


(§) 
(§) 


Non-Gaussian prior with auxiliary field B{x;v) 


E{v) = i/dx {[1 - B{x)]\u;^{x)f + B{x)\co2{x)f) 


( 


76 


) 



Table 1. Summary of basic prior energy variants discussed in 
this paper. 



5 Stationarity equations 

To reconstruct a local potential v in MAP we have to 
maximize the posterior p{v\D) with respect to v. If the 
functional derivative of the posterior with respect to v 
exists, the reconstructed potential can be found by solving 
the stationarity equation 



S^liip{v\D) = 0, 



(85) 



where we have chosen the logarithm for technical con- 
venience, and denotes the functional derivative with 
respect to v. 



For observational data consistingof n independent po- 
sition measurements the posterior (0) reads 



p{v\D) oc p{v)Y\_p{xi\x,v). 



(86) 



To formulate the stationarity equation (85) we have to 
calculate the functional derivatives of likelihood and prior. 
For inverse quantum statistics ||l2| the likelihood for posi- 
tion measurements (|^) on a canonical ensemble (^) de- 
pends on the eigenfunctions and eigenvalues of the v— 
dependent Hamiltonian H{v). We thus have to find the 
functional derivatives of the eigenfunctions 0q. and eigen- 
values Ea ■ Those can be obtained by taking the functional 
derivative of the eigenvalue equation H\(j)a) — Ea\(j)a), 
where we will assume the eigenfunctions to be orthonor- 
malized. Choosing {(/)a \ 5y(x)4'a) — and utilizing 

5,^^,)H{x', x") = (5„(,)F(x', x") = 5{x - x')5ix' - x"), 

(87) 

we find for nondegenerate eigenfunctions 

Sv{x)Ea = (0Q I I 0q) = \4'a{x)\'^, (88) 

^v{x)<l>a{x') = ^ -—^—(l)^{x')(f>*{x)(f>a{x). (89) 

It follows for the functional derivative of the likelihood 

Sv{x)P{Xz\x, v) = ( {5y(x)(l)* (Xi)) (j){Xi) ) 

+ {(j)*{xi)6^^^)(l){x^)) 
-f3[mx,)mx)\') 

-mx.)\')mx)\^) 



(90) 



Having obtained Eq. (^^ for the likelihood we now 
have to find the functional derivative of the prior. For the 
Gaussian prior ( [Til ) one gets directly 



Sv Inp(w) = -Ko{v - vq). 



(91) 



If hyperparameters 9 are included and treated in MAP 
(i.e., not integrated out by Monte Carlo techniques), the 
posterior has to be maximized simultaneously with respect 
to V and 9. We have already mentioned that 0-dependent 
inverse covariances lead to normalization factors which are 
independent of v but depend on 9. Such factors have to 
be included when maximizing with respect to 9. 

As a non-Gaussian example consider a prior where two 
filtered differences are mixed by an auxiliary field B{x) 
and an additional prior factor p{B) is included, for ex- 
ample to prevent fast oscillations of B{x). With B{x) — 
<j(u(x) — "i?), threshold d, sigmoidal function a{x) as in 
Eq. (pl|), and u{x) = |ijJi(x)P — |w2(a;)P this gives 

1 r I P 

J3(u) oc J "^a; |[l-B(x)]i^i(2;)-|-B(3:)(^2(a;)| -Eb ^g2^ 

Analogous to Eq. ([75[), the term 

Eb = j dxEeix), (93) 
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represents an auxiliary prior energy formulated in terms 
of the mixing function B{x). Like lij{x) the function value 
Eb{x) may depend on the whole function B and not nec- 
essarily only on the function value B{x). Using uji{x) — 
{x I Wi(w — Vi)) we find 

<5„(^)u;j(a;') = W,(a;',a;), (94) 

and thus 

8,(^)U{X') = 2 (Wf X') L0^{X') - W^{X, X') L02{X')) . 

(95) 

Furthermore, we obtain for the functional derivative of Eb 
5,(^,)Eb{x') = j dx" [<5,„(,)i?(x")] [<5B(x")^^B(a;')] , (96) 

where with Eq. ( [zi] ) 

5„(,)B(a;") = a'iu{x") - i?)5,(,)u(a;"), (97) 



and (t'(m) = da{u)/du. For a prior energy as in ([78|) which 
is quadratic in B{x) 



Eb(x) = \ujb{x)\^, 



(98) 



u}b{x) defined in Eq. (|8l|), the functional derivative with 
respect to B{x) becomes 

5b(x)Eb{x') = 2Wl{x, x') LUBix'). (99) 

For a non-Gaussian prior with energy ( [79| ) an additional 
derivative of the sigmoid appears. Now all terms can be 
collected and inserted into the functional derivative of the 
prior (§S 



Sy Inp(u) = - I dx{^[[l- B{x)]uJi{x) + B{x)lu2{x)] 

X ([1 - B{x)]SyUJi{x) + B{x)SvUJ2{x) 

+ 5yB{x)[uj2{x) - LOlix)]) 
+ SyEB{x)). (100) 



The Bayesian approach to inverse quantum mechanics 
is not restricted to position measurements, but allows to 
deal with all kinds of observations for which the likelihood 
can be calculated. To have better information about the 
depth of a potential it is useful to include information on 
the ground state energy of a system. For instance, includ- 
ing a noisy measurement of the average energy 



(101) 



yields an additional factor in the posterior of the form 

-E„ r. 



In the noise free limit — > cx) this yields U ^ k. 



(102) 



Calculating the functional derivative of U with respect 
to a local potential 

^.(.)C/ = ( Sy(^,)E )-l3{E 5y(^,)E) +p{E){ S^^^^E), 

(103) 

it is straightforward to obtain 

Sy^^^Eu ^ ^i{U-K){\(t>ix)\^[^~p{E^U)]). (104) 

Stationarity equations are typically nonlinear and have 
to be solved by iteration. A possible iteration scheme is 



.jir+i) ^ v(r)+riA-^ \6y lnp(t;('')) 

+ ^^Sy\np{Xi\x,v'^''^) - SyE^ 



(105) 



Here 77 is a step width which can be optimized by a line 
search algorithm and the positive definite operator A dis- 
tinguishes different learning algorithms. 



6 Numerical examples 

As numerical application of BIQM and to test several vari- 
ants of implementing a priori information we will study 
the reconstruction of an approximately periodic, one-di- 
mensional potential. Such a potential may represent a 
one-dimensional surface where a periodic structure, e.g. 
that of a regular crystal, is distorted by impurities, lo- 
cated at unknown positions and of unknown form. 

To test the quality of reconstruction algorithms, artifi- 
cial data will be sampled from a model with known "true" 
potential Vtmc- Selecting a specific prior model and apply- 
ing the corresponding Bayesian reconstruction algorithm 
to the sampled data, we will be able to compare the recon- 
structed potential with the original one. In particular, we 
will take as true potential the following perturbed periodic 
potential 



VtruAx) 



sin (fa;) 



sin(ffx) 



1 < a; < 12, 25 < a; < 36, 
13 < a; < 24, 

(106) 

using for the numerical calculations a mesh of size 36. 
Considering a system prepared as canonical ensemble the 
potential Wtruo defines a corresponding canonical density 
operator p as given in Eq. (^). Artificial data D can then 
be sampled according to the likelihood model of quantum 
mechanics (|^). For the following examples, n = 200 data 
points representing position measurements have been sam- 
pled using the transformation method ||5^ . In all calcula- 
tions we used periodic boundary conditions for quantum 
mechanical wave functions while the potential v has been 
set to zero at the boundaries. 

We will now discuss the results of a Bayesian recon- 
struction under varying prior models. As first example, 
consider a simple Gaussian prior (O) with negative Lapla- 
cian inverse covariance Kq = — A37 zero refer ence poten- 
tial Wo = 0, and an additional prior factor (102) repre- 
senting a noisy measurement of the average energy. The 
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reconstruction results are shown in Fig. |^. In particular, 
the figure on top compares the reconstructed likelihood 
Pbiqm{x\x, fBiQM) with the true hkelihood {x\x,Vtiuc) 
and with the empirical density, i.e., the relative frequen- 
cies of the sampled data 



1 " 

Pcmpix) ^ - ^S{X - Xi). 



(107) 



Similarly, the lower figure compares the reconstructed po- 
tential wbiqm with the true potential wtruc- Since infor- 
mation on the average energy was available the depth of 
the potential is well approximated at least at one of its 
minima. This is sufhcient to fulfill the noisy average en- 
ergy condition. However, because only smoothness and no 
periodicity information is implemented by the prior the 
reconstructed potential is too flat. The effect is stronger 
near the maxima than near the minima of the potential be- 
cause near the maxima only few data points are available 
and hence the reconstructed potential is there dominated 
by the zero reference potential in the smoothness prior. 

To include information on approximate periodicity we 
have replaced in the next example the zero reference po- 
tential = by the strictly periodic reference potential 



Vq(x) 



(108) 



shown as dashed line in the following figures of poten- 
tials. Areconstruction with the periodic reference poten- 
tial (lOS) but without average energy information, and 
starting the iteration with the reference potential as ini- 
tial guess w'"' = vq is shown in Fig. |[ Due to missing 
average energy information the depth of the potential is 
not well approximated. It is also clearly visible in Fig. || 
that the smoothness prior does not favor solutions which 
are similar to the reference vq itself but solutions which 
have derivatives similar to that of vq . Fig. ^ also displays 
that the reconstruction of the potential does clearly iden- 
tify the impurity. As the reference potential is not adapted 
to the impurity region the reconstruction is there poorer 
than in the regular region. 

Furthermore, it is worth emphasizing that the recon- 
structed likelihood fits the empirical density well, even 
slightly better than the true likelihood does. This is due 
to the flexibility of a nonparametric approach which al- 
lows to fit the fluctuations of the empirical density caused 
by the finite sample size. The effect is well known in em- 
pirical learning and leads to so called "overfitting" if the 
influence of the prior becomes to small. Since observa- 
tional data influence the reconstruction only through the 
likelihood, the reconstruction of potentials is in general a 
more difficult task than the reconstruction of likelihoods. 
This indicates the special importance of a priori informa- 
tion when reconstructing potentials. Indeed, even if the 
complete likelihood is given, the problem of determining 
the potential can still be ill-defined in regions where the 
likelihood is small |57[ |. 

A prior model with periodic reference potential can be 
made more fiexible by adapting amplitude, frequency, and 



phase of the reference potential (108). For this purpose 
one can introduce a hyperparameter vector — (0i, ^2, ^3) 
parameterizing amplitude, frequency, and phase and take 
as reference potential 



vo{x;9) 



ii sm 



27r 



(109) 



The corresponding maximization of the posterior with re- 
spect to 9 is easy in that case and does not change the 
results of Fig. ^ where the hyperparameters are already 
optimally adapted. 

Including an additional noisy energy measurement ( |102| ) 
Fig. H shows that the depth of the potential is indeed bet- 
ter approximated than in Fig. ^. To avoid local maxima 
of the posterior the solution of Fig. ^ has been used as ini- 
tial guess and the factor multiplying the average energy 
term has been slowly increased to its final value. Fig. H 
still only represents a local and no global maximum of the 
posterior, as can be by seen by starting with a different 
initial guess u'"'. In Fig. ^ a better solution for the same 
parameters is presented where the initial guess has been 
selected using a priori information about the location of 
the impurity region. 

Alternatively to a Gaussian prior with periodic refer- 
ence, approximate periodicity can be enforced by the in- 
verse covariance of a Gaussian prior. In this case the prior 
favors periodicity but no special form of the potential. 
The prior is thus less specific than a prior with explicit 
periodic reference function. Corresponding BIQM results 
for the inverse covariance ( |2^ ) are shown in Fig. |^. In- 
deed while the potential is well approximated in regions 
where many observations have been collected, it is not as 
well approximated in regions where no or only few data 
are available. These are the regions where the prior dom- 
inates the observational data. In particular, in the case 
presented in Fig. |^, the zero reference function tip = of 
an additional Laplacian smoothness prior implements a 
tendency to fiat potentials. 

If impurities are expected, a prior with one fixed peri- 
odic reference potential for the whole region is no adequate 
choice. Near impurities one would like to switch off the 
standard periodic reference potential which in these re- 
gions will be misleading. Because it is usually not known 
in advance where a given reference should be used and 
where not, those regions must be identified during learn- 
ing. As first example we study a prior energy similar to 
Eq. (p3|). 



E{v) 



Al 
2 



dx \v{x) - t;o(x)ni - B{x)] -^{v\A\v) 



(110) 



which allows to switch off a given reference locally by 
means of a binary switching function defined as B{x) — 
{\v{x) — vq{x)\'^ — 1?). (An average energy term Ejj = 
^{U — k)^ could easily be included.) In the prior en- 
ergy ( |110| ) the reference vq is only used if \v{x) — vo{x)\'^ 
is smaller than the give n th reshold Starting with a 
smoothed version of Eq. (IIC ) with a real mixing function 
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B{x) ~ a {\v{x) — vo{x)\'^ — -d), the results of Fig. ||have 
been obtained by changing during iteration cr(x) slowly 
from a sigmoid to a step function. Using a step function 
for B directly from the beginning leads to nearly indistin- 
guishable results. Compared to Fig. |^ the reconstruction 
in Fig. ^ is improved mainly in the unperturbed region 
where the algorithm can now use the correct reference po- 
tential. An additional advantage is that the final auxiliary 
field B{x) directly shows the identified impurity regions. 
One sees in Fig. H that the auxiliary field B{x) is always 
switched off if the solution v{x) is similar enough to the 

template vq{x). 

The two w-dependent terms in Eq. ( |llC| ) can be com- 
bined [compare Eqs. ( |6^ ) and (|64|)]. Skipping a term which 
only depends on v through B{x), one arrives at another 
prior which also implements local switching. More general, 
choosing the prior energy (76) for switching between two 
filtered differences with two reference potentials vi and V2 
leads to 



E{v) = ^ Jdx[l-B{x)]\u;,{x)f^ 



+ Y I dxBix)\u;2{x)\^ 



(111) 



where the switching is controlled by the binary function 
B{x) = (|wi(a;)p — |ci;2(a;)P — defined in terms of the 
filtered diffe rences LOi{x) — [d / dx)[v{x) — Vi{x)]. A prior 
energy ( |lll| ) with two different nonzero reference poten- 
tials vi and V2 is obtained, for example, when a different 
nonzero reference potential is given for the unperturbed 
and the perturbed region. The number of changes in the 
switching function B{x) = (|a;i(x)p — |cj2(a;)P — can 
be controlled by adding a prior term p{B) penalizing the 
number of times the function B{x) changes its value. To 
avoid local minima for binary B{x), simulated anneal- 
ing techniques are useful. We have obtained an initial 
guess for u, and thus for B{x), by writing v(x) = [1 — 
c{x)]vi{x) + c{x)v2{x) and optimizing the binary function 
c{x) by simulated annealing with respect to the likelihood 
and the additional prior p(i3). In particular, starting from 
c{x) = 0, new trial functions have been generated by se- 
lecting two points xi, X2 randomly and exchanging the 
function values zero and one in between (see Fig. 0). A 
new trial function has been accepted or rejected using 
the Metropolis rule p(accept) = min[l, exp(— /3ann^-Eann)] 
with Z\£'ann denoting the difference in the error between 
actual function and new trial function. In the present 
case we have -Eann(^') = + Eb{v) where 

E{xi\x,v) = —\np{xi\x,v) and p{B) oc exp(— £'s). The 
annealing temperature 1 / /3ann decreases during optimiza- 
tion. 

Fig. ^ shows the reconstruction results using the fol- 
lowing two reference potentials 



Compared to Fig. ^ the reconstruction is improved in the 
perturbed region, where the algorithm can now rely on a 
useful reference potential. 

Finally, the switching function can be introduced as 
local hyperfield. As an example for a prior with hyperfield. 
Fig. shows the reconstruction with the prior energy 



Al 
2 



v-vo{0)\v-vo{0))- 



X2 

2 



where vo{x;9) — vi{x)[l — 9(x )] +V2{x)9{x 
erence potentials of Eq. (112) and Eq. (113) 



v\A\v)~lnp{0), 

(114) 
with the rcf- 
A hyper- 



prior p{6) has been used penalizing the number of dis- 
continuities of the hyperfield 9{x), analogo us to p{B) for 
Fig. ^. The E(v\9) part of the prior energy (114) is of the 
form ( ^9| ) with ^-independent covariances. Hence the 9- 
independent normalization factor can be skipped. An ini- 
tial guess for the local hyperfield 9{x) has been obtained 
by simulated annealing as described for Fig. |[ As in this 
case optimization is required only with respect to the 9— 
dependent parts of the posterior, optimizing 9{x) for given 
V is faster than optimizing v through c{x) which requires 
diagonalization of the hamiltonian H for every new trial 
function. However, as 9{x) is independent of v, the hyper- 
field has to be updated during iteration which has also 
been done by simulated annealing. As expected, a recon- 
struction with th e no n-Gaussian prior corresponding to 
the prior energy (111) is ve ry si milar to a reconstruction 
using hyperfields as in Eq. ( 114 ). 



Fig. 2. Generation of new trial configurations for simulated 
annealing by selecting two points randomly and exchanging 
the values zero and one of the binary function in between. This 
mechanism has been used to optimize the binary functions c{x) 
and 6(x). 



vi{x) 
V2ix) 



■ sm 



27r 











sin 











(112) 



7 Conclusion 



A nonparametric Bayesian approach has been developed 
^ ' and applied to the inverse problem of reconstructing po- 
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tentials of quantum systems from observational data. Re- 20 
lying on observational data only the problem is typically 
ill-defined. It is therefore essential to include adequate 21 
a priori information. Since reconstructed potentials ob- 
tained by Bayesian Inverse Quantum Mechanics (BIQM) 22 
depend sensitively on the implemented a priori informa- 
tion, flexible prior models are required which can be adapted 
to the specific situation under study. In particular, the use 
of hyperparameters, hyperfields, and non-Gaussian pri- 
ors with auxiliary fields has been discussed in detail. In 
this paper we have focussed on the implementation of ap- 
proximate periodicity for potentials in inverse problems 
of quantum statistics. The presented prior models, how- 25 
ever, can be useful for many empirical learning problems, 
including for example regression or general density esti- 26 
mation. Several variants of implementing a priori infor- 
mation on approximate periodicity have been tested and 
compared numerically. 27, 
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Fig. 3. Gaussian prior with Laplacian inverse covariance, zero 
reference potential, and additional noisy energy measurement. 
Top: Empirical density pomp (bars), true Ukelihood ptruc (thin), 
reconstructed hkelihood pbiqm (thick) Bottom: Reconstructed 
potential ubiqm (thick) and true potential Vtmc (thin). (With 
200 data points, m — 0.25 for ft = 1, /3 = 4. Gaussian prior ( pT| ) 
with inverse covariance Kq = ~XA, A — 0.2, zero reference 
potential vp = 0, and an additional energy penalty term of 
the form (102) with fj, = 1000 and k = —0.330, equal to the 
true average energy U{vtTue). T he so lution has been obtained 
by iterating according to Eq. (105) with A = Kq, starting 
with initial guess w'"' = 0. The optimal step width rj has been 



determined for each iteration by a line search algorithm. 
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Fig. 4. Gaussian prior with periodic reference potential with- 
out noisy energy measurement. Top: Empirical density Pcmp 
(bars), true likelihood ptruc (thin), reconstructed likelihood 
Pbiqm (thick) Bottom: Reconstructed potential ubiqm (thick) . 
true pot enti al wtruc (thin), and reference potential vo (dashed) 
of Eq. (108). (Number of data points and parameters m, /3, 
Ko, and A as for Fig. H but with fi = 0. The solution has been 



obtained by iterating according to (105) as described for Fig.j 
with initial guess w'"' = vq.) 



Fig. 5. Gaussian prior with periodic reference potential and 
additional energy measurement, improving the a ppro ximation 
of the minima. (Reference potential vo given in (108), energy 
penalty term as in (102) with /i — 1000 and k = —0.330. All 
other parameters as for Fig. H. Iterated with the solution shown 
in Fig. 1^ as initial guess v^.) 
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Fig. 6. Gaussian prior with periodic reference potential and 
additional energy measurement, with initial guess v^^^ dif feren t 
from that of Fig. |^. (Refe rence potential «o given in ( |l08| ][^ 
energy penalty term as in (102) All parameters as for Fig. ^ 
Iterated with initial guess v^"' (x) — vo{x) for < a; < 12, 25 < 
X and v^^^x) = for 13 < i < 24.) 
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Fig. 7. Approximate periodicity implemented by an inverse 
covariance Ko = —X{A + 'yAg) as in Eq. (H). (With 7 = 1.0, 
A — 0.2, a fixed 6 = 6, energy penalty term with /i = 1000, and 
zero reference potential «o = 0. Initial guess w'"-* = vo = 0- All 
other parameters as for Fig. ^ ) 



Fig. 8. Local switching between periodic and zero reference 
potential. The black bars on top indicate regions where B{x) 
= 1, i.e .. re gions where impurities have been identified. (Prior 
of Eq. (110) with Ai = 0.2, A2 = 0.2, jj. = 0, and reference po- 
tential as in (108). The u-dependent function B{x) was slowly 
changed from a sigmoid to a step function during iteration, 
keepingthe threshold 1? = 0.15 fixed. AH other parameters as 
in Fig.^. Initial guess t;''^^ as for Fig. ^ 
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Fig. 9. Local switching between two nonze ro re ference 
tentials. (Reference potentials v\, V2 given in and 
Prior of Eq. (Ill), with Ai = A2 = 10, n = 0. Step function 
for B{x) with 1? = 0. An additional prior p{B) on B has been 
included with — lnp(_B)/10 counting the number of disconti- 
nuities of the function B{x). Other parameters as in Fig. M.) 



. 15 



. 1 



. 05 




5 10 15 20 25 30 35 




Fig. 10. Prior with local hyperfield. (Prior of Eq. (114), with 
Al = 10, X2 ~ 1, ~ 0, jj, = 0, including a hyperprior p{9) with 
Eb/ 10 counting the number of discontinuities of the hyperfield 
9(x). Other parameters as in Fig. ^) 



